ABSTRACT

Xanthusbase (http://www.xanthusbase.org), a model organism database for the bacterium Myxococcus xanthus, functions as a collaborative information repository based on Wikipedia principles. It was created more than 5 years ago to serve as a cost-effective reference database for M. xanthus researchers, an education tool for undergraduate students to learn about genome annotation, and a means for the community of researchers to collaboratively improve their organism's annotation. We have achieved several goals and are seeking creative solutions to ongoing challenges. Along the way we have made several important improvements to Xanthusbase related to stability, security and usability. Most importantly, we have designed and implemented an installer that enables other microbial model organism communities to use it as their MOD. This version, called Openmods, has already been used to create Xenorhabdusbase (http://xenorhabdusbase.bact.wisc.edu) and Caulobacterbase (http://caulobacterbase.bsd.uchicago.edu), and will soon be used to create Bdellovibriobase.



BACKGROUND: XANTHUSBASE AS A WIKI-STYLE 
COLLABORATIVE INFORMATION REPOSITORY 

Wikis were first envisioned as collaborative information 
repositories (CIR) for the distributed annotation and 
re-annotation of genomes in model organism databases 
(MODs) more than 10 years ago (1). The idea held 
enormous promise. An online wiki-style CIR MOD 
offers the potential to solve three challenges confronted 
by many smaller model organism communities in the 
post-genomic era (2). 

The first is the ongoing dependence of biological databases, including MODs, on constant federal support for proper maintenance and curation (3). As databases proliferated over the past decade, many reached critical crossroads: funding organizations that supported their design and implementation were anxious to wean them off the federal funding stream for maintenance and updates. Wikis were seen as one possible solution.

The second challenge is to develop an effective means to 
engage more undergraduate students in real genomics 
research. Undergraduates represent a large and enthusiastic
group that can make meaningful advances in many
areas of science (4), and genome annotation provides an 
excellent opportunity to realize this potential (5). This 
approach was first used to annotate the genome of the 
plant pathogen Agrobacterium tumefaciens C58 (6). Since 
then, there have been several additional annotation 
projects involving undergraduate students (5,7-9). Given 
these successes, it is somewhat surprising that the process 
has not been more widely adopted. The projects serve as a 
mechanism for undergraduate students to interact with 
researchers, contribute to genome annotations and see 
the impact of those contributions. 

The third challenge faced by model organism 
communities is that, more often than not, researchers 
who study the model organism perceive that they have 
little or no direct participation in the annotation of its 
genome. Errors in the Genbank file for an organism's 
genome persist, sometimes for years, even though any 
member of that organism's research community could 
correct them with a few simple edits. Wikis were seen as a
means of directly engaging a research community in the 
continual process of re-annotation, thereby offering a 
path to more accurate and up-to-date genomes (10). 

Our original goal with Xanthusbase was to address each of these three problems. In 2005, at the 32nd International Conference on the Biology of the Myxobacteria, the research community that studies Myxococcus xanthus expressed unanimous support for the creation of an accessible, inexpensive wiki-style CIR MOD, and several Principal Investigators (PIs) also expressed interest in the educational potential of such a MOD. From its first iteration, the Xanthusbase user interface was designed to be familiar to anyone who has used a MOD. The genome annotation was parsed into genepages, each of which represented one open reading frame (ORF). Genepages were arranged in order along the genome, which was represented as a single circular chromosome. The feature that distinguished Xanthusbase from other databases was editing. Any registered user could update the annotation by editing a genepage, with the following three caveats: all edits would be identified by author; all edits would be saved forever; and any user could return to a previous edit with a single mouse click. These three caveats are what made Xanthusbase 'wiki-style', since they are common to any wiki.
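
The three wiki caveats (attributed edits, permanent history, one-click return to an earlier edit) map naturally onto an append-only revision log. The Python sketch below is illustrative only: Openmods itself is built on PHP, Java, MySQL and Joomla, and the class and method names here (WikiField, edit, revert) are hypothetical rather than taken from its code.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Revision:
    """One saved edit to a genepage field; stored revisions are never changed."""
    author: str
    text: str
    timestamp: datetime


@dataclass
class WikiField:
    """Hypothetical model of one editable genepage field and its full history."""
    history: list = field(default_factory=list)

    def edit(self, author: str, text: str) -> Revision:
        # Caveats 1 and 2: each edit records its author and is appended,
        # so earlier versions are kept rather than overwritten.
        rev = Revision(author, text, datetime.now(timezone.utc))
        self.history.append(rev)
        return rev

    def current(self) -> str:
        """Text shown on the genepage: the most recent revision, if any."""
        return self.history[-1].text if self.history else ""

    def revert(self, author: str, index: int) -> Revision:
        # Caveat 3: returning to an earlier version is itself a new,
        # attributed edit, so the reverted-away text stays in the history.
        return self.edit(author, self.history[index].text)


# Example: a note is changed, then restored to its first version.
note = WikiField()
note.edit("alice", "putative ABC transporter")
note.edit("bob", "ABC transporter, permease component")
note.revert("carol", 0)
print(note.current())       # putative ABC transporter
print(len(note.history))    # 3
```

Treating a revert as a new attributed edit, rather than deleting the later revisions, is what keeps the 'saved forever' caveat intact.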

There are larger and more comprehensive databases that include information about the M. xanthus genome, such as NCBI (11), JCVI (http://www.jcvi.org/), BioCyc (12) and many others. Xanthusbase aggregates information from some of them, such as Pfam (13), KEGG (14) and GO (15), and reproduces some of the features of others, but its function as an autonomous, community-owned and community-operated wiki-style CIR MOD remains unique. The continued support of the community of researchers who study M. xanthus as a model organism, as well as interest from other small model organism communities, indicates that this system can play a unique role in supporting research and education.



FIVE-YEAR UPDATE 

Since the original publication, we have made major improvements, many of which are not immediately obvious. We have completely rewritten the database software. The previous version required manual installation onto a server, which made it difficult to create exact replicas of the server for developing and testing updates. We therefore created an automatic installer to alleviate maintenance problems. The resulting installer and website software are general enough that they can be used to create a MOD for any bacterium that: (i) has a single circular chromosome, (ii) has an existing Genbank file and (iii) is listed in KEGG. Since the software is no longer specific to M. xanthus, we renamed it Openmods; Xanthusbase now refers to the installation of Openmods for M. xanthus.

An Openmods installation requires a dedicated computer or virtual machine running Debian or Ubuntu, and a dedicated network connection. Installation consists of two simple commands. The first installs the software dependencies that Openmods requires using the automated software installer present in Debian and Ubuntu (apt). The second asks the user for some information (using debconf) about the specific installation. After about 10 min the website has acquired most of the information from external sources and is ready to be viewed. It takes about a week for the website to complete its first update of all information from external sources. Figure 1 gives a high-level view of how information enters and exits the database.

Three steps are required from the user to begin an Openmods installation. The first is to set up the header: the user must provide a website name, PI information and an institution logo. The top of Figure 2 shows the header of the Openmods installation specific to Xanthusbase. On the left-hand side is the text 'XANTHUSBASE' with the subtitle 'Myxococcus xanthus DK1622 and related bacteria'. This graphic is created automatically by a Java program in the installer. On the right-hand side are the PI contact information and the institution logo. After installation, this information can be updated using a command ($ openmods configure).

The next step is organism specification. The user is required to enter three pieces of information: the FTP location of the Genbank file, the KEGG organism abbreviation and a name for the organism to be displayed in the 'Select Organism' drop-down menu. After the first organism is specified, the installer continues to the data backup section, but additional organisms can be installed on an Openmods installation using the command ($ openmods new_organism). Organisms can also be deleted using ($ openmods delete_organism).

Figure 1. Openmods data sources and sinks. Motifs and Pathways are pulled using the KEGG SOAP API. Blast on Genes and Blast2GO are used to produce GO terms. The Genbank file is automatically downloaded via FTP, and the following information is then parsed from it: Locus, Coordinates, Length, Strand, Gene Name, Gene Products, Note and Sequence. Users are allowed to edit most of the fields, but Gene Name, Gene Products and Note are the only ones saved to the updated Genbank file. All data from the website are saved to the automatic backup.

Figure 2. Openmods genepage screenshot. From top to bottom, the genepage contains: Genome Browser, Locus (static), Coordinates (static), Gene Length (static), Strand (static), Gene Name (editable), Gene Products (editable), Note (editable), Motifs (periodically updated but not editable), Pathways (periodically updated but not editable), GO information (periodically updated but not editable), Mutant Strains (editable), Protein Synonyms (editable), Gene Synonyms (editable), Mutant Phenotypes (editable) and Community Description (editable). The genepage also contains the protein and nucleotide sequences, which are not shown due to space constraints. The Gene Name, Gene Products and Note are included in Genbank file updates.

The third step is for the user to specify how data backup will be performed. One of two options for automatic backup must be selected. The first option is to back up to a USB flash drive. If this option is selected, the user should insert a blank flash drive of at least 4 GB into any USB port on the computer. Once a week, the automatic backup daemon runs, detects any flash drives plugged into the system and pushes backups to them. Even if the entire computer hosting an Openmods installation is destroyed beyond repair, everything needed to recreate the installation is present on the flash drive, including usernames and user annotations. To install on a new computer from a backup, install a fresh copy of Openmods, do not specify an organism, and run the command ($ openmods import <backup_filename>). A backup can be triggered outside of the routine schedule by running a command ($ openmods backup). The second option is to send backups to a remote SFTP server. This is useful if Openmods is installed on a virtual machine to which the administrator has no physical access, and therefore cannot plug in a flash drive. The administrator can change these settings at any time using the command ($ openmods backup_setup).
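
A minimal Python sketch of the flash-drive backup step, assuming the daemon has already found the mount points of the attached drives (drive detection, the weekly schedule and the SFTP option are not shown). The archive name, the function names and the assumption that one data directory holds everything needed for a restore are all hypothetical; this is not the Openmods implementation.

```python
import shutil
import tarfile
from datetime import date
from pathlib import Path


def dir_size(path):
    """Total size in bytes of the regular files under `path`."""
    return sum(f.stat().st_size for f in Path(path).rglob("*") if f.is_file())


def eligible_drives(mount_points, needed_bytes):
    """Keep only the mounted drives with at least `needed_bytes` free."""
    return [p for p in mount_points if shutil.disk_usage(p).free >= needed_bytes]


def push_backup(data_dir, mount_points, today=None):
    """Write one gzip-compressed archive of `data_dir` to every eligible drive.

    `data_dir` is assumed to hold everything needed for a restore (database
    dump, user uploads, configuration). Returns the paths written.
    """
    today = today or date.today()
    name = f"openmods-backup-{today.isoformat()}.tar.gz"  # hypothetical naming
    # Rough space check: the uncompressed size serves as an approximate bound.
    needed = dir_size(data_dir)
    written = []
    for drive in eligible_drives(mount_points, needed):
        target = Path(drive) / name
        with tarfile.open(target, "w:gz") as tar:
            tar.add(data_dir, arcname="openmods")
        written.append(target)
    return written
```

Writing to every eligible drive, rather than only the first, means a second flash drive gives an extra copy at no additional configuration cost.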

Information is still organized according to genepages, where each genepage represents one ORF, and genepages are organized by their location in the genome. Figure 2 shows a representative genepage, in which information is grouped into sections from top to bottom. The first section is the Genome Browser, a clickable image showing nearby ORFs. A user can click an ORF in the Genome Browser to go to its genepage, or can scroll left or right along the chromosome by clicking the left or right arrows. The Genome Browser was developed from scratch because existing genome browsers, such as GBrowse (http://gmod.org/wiki/GBrowse), require user intervention to install and do not integrate with the Openmods installer.
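
Because the chromosome is circular, the Genome Browser's neighborhood and scrolling logic has to wrap around from the last ORF to the first. A minimal Python sketch, assuming ORFs are held as a list of locus tags sorted by start coordinate; the function names, the window radius and the example locus tags are hypothetical.

```python
def nearby_orfs(loci, center, radius=3):
    """Locus tags within `radius` positions of `loci[center]` on a circular chromosome.

    `loci` is ordered by start coordinate; indices wrap modulo len(loci),
    so the last ORF is adjacent to the first.
    """
    n = len(loci)
    if n == 0:
        return []
    if 2 * radius + 1 >= n:
        # Tiny genome: show every ORF exactly once, centered as far as possible.
        offsets = range(-(n // 2), n - n // 2)
    else:
        offsets = range(-radius, radius + 1)
    return [loci[(center + k) % n] for k in offsets]


def scroll(center, step, n):
    """Move the browser window by `step` ORFs (negative = left), wrapping at the origin."""
    return (center + step) % n


loci = [f"MXAN_{i:04d}" for i in range(1, 11)]  # ten illustrative locus tags
print(nearby_orfs(loci, 0, radius=2))
# ['MXAN_0009', 'MXAN_0010', 'MXAN_0001', 'MXAN_0002', 'MXAN_0003']
print(scroll(9, 1, len(loci)))  # 0
```

The modulo arithmetic is what lets a single circular chromosome be drawn without a special case at the origin of replication.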

The second section contains data from the Genbank file: ORF Locus, Coordinates, Length, Strand, Gene Name, Gene Products and Note. Gene Name, Gene Products and Note are the only editable fields in this section. The third section contains Motif, Pathway and Gene Ontology data; none of this information is editable by the user. Motif and Pathway information is updated automatically from KEGG once a month, and the Gene Ontology information is updated using Blast2GO, also once a month. The fourth section contains fields that allow researchers to register mutant strains (a feature suggested by the M. xanthus community at the 38th Annual International Conference on the Biology of Myxobacteria). The fifth section contains information that is not in the Genbank file but may be useful to the community, such as Protein Synonyms, Gene Synonyms, Mutant Phenotypes and Community Description. Although none of the fields in this section are written out to an updated Genbank file, they are saved during backup.
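
The genepage sections amount to four groups of fields with different edit and export rules. The Python sketch below models them; the field keys are hypothetical, the mapping onto the standard Genbank /gene, /product and /note qualifiers is our assumption about how Gene Name, Gene Products and Note are written out, and checks by user permission level are left out.

```python
from dataclasses import dataclass, field

# Field groups following the genepage sections; the keys are hypothetical.
STATIC = ("locus", "coordinates", "length", "strand")
GENBANK_EDITABLE = ("gene_name", "gene_products", "note")
AUTO_UPDATED = ("motifs", "pathways", "go_terms")  # refreshed monthly, read-only
COMMUNITY = ("mutant_strains", "protein_synonyms", "gene_synonyms",
             "mutant_phenotypes", "community_description")

# Assumed mapping onto standard Genbank feature qualifiers.
QUALIFIERS = {"gene_name": "gene", "gene_products": "product", "note": "note"}


@dataclass
class GenePage:
    values: dict = field(default_factory=dict)

    def edit(self, key, text):
        """Accept user edits only for the editable groups."""
        if key not in GENBANK_EDITABLE + COMMUNITY:
            raise PermissionError(f"{key} is not editable by users")
        self.values[key] = text

    def genbank_qualifiers(self):
        """Only Gene Name, Gene Products and Note go to the updated Genbank file."""
        return {QUALIFIERS[k]: self.values[k]
                for k in GENBANK_EDITABLE if k in self.values}

    def backup_record(self):
        """Every stored field, community-only ones included, is kept in backups."""
        return dict(self.values)
```

Keeping the static and automatically updated fields out of `edit` mirrors the text: they change only when the Genbank file or the monthly KEGG and Blast2GO updates change them.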

It is not necessary to register in order to view Xanthusbase and browse the annotation; however, those who wish to add information must register. Access and security are managed by designating four types of users with different permission levels: blocked user, student, researcher and principal investigator. A person who registers with Xanthusbase remains a blocked user until their permission level is changed (to prevent web bots from compromising the integrity of the database). Users with permission level 'student' may contribute changes to the annotation, but these changes are marked as pending until approved by a user with permission level 'researcher' or 'principal investigator'. Users with permission level 'researcher' may make changes to the annotation without approval, and may approve or deny any suggested student-level changes. Users with permission level 'principal investigator' may do everything a researcher can do, and in addition can delete users, change their permission level, create groups of users according to laboratories or classes, and download a group's media (PDFs, images and videos) as a zip file with one folder per user in the group. This last feature is useful for grading. Also, only a 'researcher' or 'principal investigator' can register a mutant strain.
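
The four permission levels form an ordered hierarchy, so each rule above reduces to a comparison. A hypothetical Python sketch (the real checks are performed server-side in Openmods' own PHP code, which is not reproduced here; the names below are ours):

```python
from enum import IntEnum


class Level(IntEnum):
    """Permission levels, ordered from least to most privileged."""
    BLOCKED = 0
    STUDENT = 1
    RESEARCHER = 2
    PRINCIPAL_INVESTIGATOR = 3


def submit_status(level):
    """Status of an annotation change submitted by a user at `level`."""
    if level == Level.BLOCKED:
        # New registrants stay blocked until a PI changes their level.
        raise PermissionError("blocked users cannot edit")
    return "pending" if level == Level.STUDENT else "published"


def can_review(level):
    """Researchers and PIs may approve or deny pending student changes."""
    return level >= Level.RESEARCHER


def can_register_mutant(level):
    return level >= Level.RESEARCHER


def can_manage_users(level):
    """Delete users, change levels, create groups, download group media."""
    return level == Level.PRINCIPAL_INVESTIGATOR
```

Using an ordered enumeration keeps every rule that reads "researcher or above" a single comparison, so adding a level later would not require rewriting each check.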

'Principal investigator' and 'researcher' contributions, as well as student-proposed contributions, are visible for each ORF on its respective genepage, and they are also compiled on a list of recent changes. Although the entire history of changes is stored in the database in perpetuity, a user can immediately undo a change in the event of an input error. A Genbank file for the organism that incorporates all of the most recent changes can be downloaded using the command ($ openmods genbank_out).

The discussion examines Openmods as a solution to 
each of the three problems stated in the introduction, 
using Xanthusbase as an example. 



ADDRESSING PROBLEM 1: Openmods AS AN 
FUNCTIONAL AND COST-EFFECTIVE MOD 

Openmods has succeeded in becoming a functional and cost-effective MOD. It depends only on free software, is itself entirely free and open source, and can be installed on inexpensive hardware. It is built on PHP, Java, MySQL and the Joomla Framework. Joomla is used to allow administrators to change users' passwords and the static text on the site. For hardware, we recommend a computer with at least one 32-bit CPU core and 2 GB of RAM.

We have tested the current version of Openmods to ensure its reliability. Over the past year, Xanthusbase has been running 24/7 on the hardware configuration specified above and has received between 20 and 100 unique visitors per day. It can handle loads of at least 30 simultaneous users (all of the students in our class have logged in and used the website at the same time). We currently have 74 registered users and over 7000 annotations.
Significant efforts have been made to ensure that the database remains secure. Whenever content is uploaded, Openmods checks on the server side that the user's permission level allows it. The site protects against SQL-injection attacks for all data that can reach an SQL query. It also protects against command-line escaping when calling Java-based programs for backend processing. User content is filtered for JavaScript injection and cross-site scripting. User passwords are stored by Joomla as salted MD5 hashes, so that even people with administrator access cannot view a user's password.
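
The first three protections correspond to standard techniques: bound query parameters instead of string concatenation, HTML escaping of user text on output, and passing external commands as an argument list rather than through a shell. The Python sketch below illustrates them with the standard-library sqlite3 and html modules; it is not Openmods' PHP/MySQL code, and the table layout and function names are hypothetical.

```python
import html
import sqlite3


def save_note(conn, locus, note, author):
    """Store a user's note with bound parameters instead of string formatting."""
    # The driver binds each ? to a value, so input such as
    # "x'); DROP TABLE annotations; --" is stored as text, never executed as SQL.
    conn.execute(
        "INSERT INTO annotations (locus, note, author) VALUES (?, ?, ?)",
        (locus, note, author),
    )
    conn.commit()


def render_note(note):
    """Escape user text before placing it in HTML, so <script> tags display as text."""
    return '<p class="note">' + html.escape(note) + "</p>"


def backend_command(jar_path, user_value):
    """Build an argument list for subprocess.run(args) with the default shell=False.

    Each element reaches the Java program as one argument, so characters
    such as ; or | in user input are not interpreted by a shell.
    """
    return ["java", "-jar", jar_path, user_value]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotations (locus TEXT, note TEXT, author TEXT)")
save_note(conn, "MXAN_0001", "x'); DROP TABLE annotations; --", "eve")
print(conn.execute("SELECT note FROM annotations").fetchall())
# [("x'); DROP TABLE annotations; --",)]
print(render_note("<script>alert(1)</script>"))
# <p class="note">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```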

ADDRESSING PROBLEM 2: Openmods IN 
UNDERGRADUATE BIOINFORMATICS EDUCATION 

For the past 5 years, we have taught an introductory bioinformatics class designed for upper-division undergraduate students with a strong background in molecular biology but limited computer skills. Its purpose is to present a subset of the fundamental file formats, algorithms and databases in sufficient detail that the students are able not only to identify and apply them, but also to make informed and critical interpretations. Our curriculum has lecture and computer laboratory components that teach genome annotation, and students must then apply this knowledge in interactive research. Each lecture provides background for the subsequent laboratory, which ultimately leads to a final project in which students contribute to the re-annotation of a bacterial genome (usually, but not always, M. xanthus). Lecture topics and the laboratory exercises for 2011 are included in the Supplementary Data.

We introduced a re-annotation exercise into the course using Xanthusbase in 2009. In 2011 we greatly expanded the role of Xanthusbase, owing both to curriculum development and to the significant software redesign described above. For the final project, students were assigned approximately 50 ORFs from the M. xanthus genome and asked to perform a thorough and documented re-annotation using Xanthusbase.

The quality of the 2011 student annotations exceeded our expectations, improving the annotation of 136 ORFs relative to the current M. xanthus Genbank file (http://www.ncbi.nlm.nih.gov/nuccore/108460647). Approximately 65% of the contributions were approved as meaningful, with the most productive student improving the annotation of all 50 reading frames she was assigned. All contributions remain on Xanthusbase and can be viewed by anyone (Select Organism / Myxococcus xanthus / Recent Annotations).

ADDRESSING PROBLEM 3: Openmods AS A 
WIKI-STYLE CIR MOD FOR THE RESEARCH 
COMMUNITY 

The intended area of impact in which we have encountered the most challenges has been engaging the M. xanthus research community in active, collaborative genome annotation through Xanthusbase. While some researchers and students have certainly made meaningful contributions, there have been far fewer than originally envisioned. We believe that this does not reflect negatively on Openmods, which has proven itself over the past 5 years to be a reliable MOD and an extremely useful teaching tool, and we certainly do not believe it reflects negatively on the community of M. xanthus researchers, who represent a truly dedicated and collaborative group. Wiki-style databases have proven to be a highly effective means of collecting and organizing information under certain circumstances, but they are not always the best option.

The barriers to collaborative genome annotation have been discussed, and they have remained unchanged since life science wikis began to proliferate. Although there has been no rigorous examination of why academic researchers do not embrace wiki-style CIRs, some reasons seem obvious to an inside observer. Academic research takes place within an established incentive structure defined by a series of goals: a tenure-track job, research funding, tenure and promotion. Contributions to a wiki-style CIR do not move a researcher closer to these goals. In fact, such contributions may actually be a hindrance, since data contributed to a wiki-style CIR might be considered published and therefore ineligible for subsequent publication in a peer-reviewed journal. Even if that were not the case, explaining your ideas before publication in an online forum could be viewed as potentially helping your competitors. Hopefully, the academic incentive structure will evolve to incorporate new approaches. For example, Prof. Michel Aaij at Auburn University Montgomery recently used his more than 60 000 contributions to Wikipedia as part of his tenure portfolio (http://blog.wikimedia.org/2011/04/06/tenure-awarded-based-in-part-on-wikipedia-contributions), but at present he represents an extreme outlier.

Although PIs may not have an adequate incentive to contribute to a wiki-style CIR MOD, perhaps graduate students and post-doctoral fellows would contribute? Since they are already experts, it could be argued that contributing would require little additional effort. Unfortunately, in a study that analyzed the effect of wikis on student writing, Wheeler et al. (16) found that students who contributed to wikis were aware that their contributions were not anonymous and would be read by peers, and that this caused them to spend more time and write more carefully than usual for their assignments. The same is likely to apply to graduate students and post-doctoral fellows contributing to a wiki-style CIR MOD that might be read by their peers, PI and other members of the scientific community. Perhaps because of this, they would feel the need to devote considerable time and effort to their contributions, and thus may conclude that this time and effort are better spent writing a manuscript or generating more data.

Unlike graduate students, undergraduate students have a strong incentive to contribute to a wiki-style CIR MOD: they want to earn a good grade in class. They are not yet experts in the model system or the process of annotation, however, and so their contributions vary in quality. This is apparent in the recent contributions from our 2011 bioinformatics class; some provide real insight into the annotation of an ORF, but others add little information. Any database built on wiki principles relies on user contributions being accurate and thorough. If Xanthusbase is to succeed, we must continue working toward a sustainable community-driven model. In this way, Xanthusbase is an ongoing experiment in scientific social networking. Having expanded to other model organisms and research communities with Openmods, we will try to learn from one another over the next 5 years. We continue to look for new participants.



CONCLUSIONS: Openmods 5-YEAR PLAN

Openmods software will be continually updated over the next 5 years. We plan to create Openmods installations for new model organisms, and to support organisms with multiple chromosomes and with chromosomes that are not circular. One PI (S. Crosson, personal communication) has requested that we allow changing the coordinates of ORFs as well as removing and adding ORFs. We plan to add this functionality. We will also introduce a feature that allows users of an Openmods installation to vote on the importance of an annotation (similar to the voting on stackoverflow.com), and that displays users ranked by the number of votes they have accumulated. We plan to work with the Debian open source community to make the Openmods package available in the Debian repositories. This will make installation easier and will allow us to push updates to Openmods installations automatically, making it easier to implement new community feature requests and to respond to security defects that require patches. We are also looking into web frameworks that support unit testing. This will allow us to more readily accept contributions from open source developers, because unit tests will help us ensure that contributions from unknown developers are of sufficiently high quality.
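
The voting feature is still only planned, so its rules are not yet fixed. One plausible tally is sketched below in Python under our own assumptions: votes are +1 or -1, each user's latest vote on an annotation replaces any earlier one, and votes on one's own annotations are not counted. The function name and data layout are hypothetical.

```python
from collections import Counter


def rank_users(votes, authors):
    """Total the votes on each user's annotations and rank users by that total.

    votes:   iterable of (voter, annotation_id, value) with value +1 or -1,
             in chronological order; a later vote by the same voter on the
             same annotation replaces the earlier one.
    authors: mapping annotation_id -> username of the annotation's author.
    """
    latest = {}
    for voter, annotation_id, value in votes:
        if value not in (1, -1):
            raise ValueError("a vote must be +1 or -1")
        latest[(voter, annotation_id)] = value
    totals = Counter()
    for (voter, annotation_id), value in latest.items():
        author = authors[annotation_id]
        if voter != author:  # assumed rule: self-votes are not counted
            totals[author] += value
    # Highest total first; ties broken alphabetically for a stable display.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


votes = [("bob", "a1", 1), ("carol", "a1", 1), ("carol", "a1", -1),
         ("alice", "a1", 1), ("bob", "a2", 1)]
print(rank_users(votes, {"a1": "alice", "a2": "dave"}))
# [('dave', 1), ('alice', 0)]
```

Keeping only each voter's latest vote prevents a single user from inflating a score by voting repeatedly.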

We also plan to expand the use of Openmods as an undergraduate education tool. We are seeking PIs outside of our current set to incorporate Openmods into their curricula. Moreover, we envision a collaborative environment within each model organism community in which researchers act as ad hoc referees for undergraduate contributions. This will not only ensure that high-quality contributions become fixed in the database, but will also teach necessary skills to the next generation of life scientists. The benefits of introducing substantive research projects into the classroom have been documented (17). Undergraduate involvement in research can improve retention in the sciences (18) and encourage the pursuit of a graduate degree (19). In a 2004 study of 1135 undergraduates from 41 universities, 91% of students reported that their research experience sustained or increased their interest in attending graduate school (20). Based on our experience with the success and continued challenges of Openmods, encouraging more undergraduate participation in genome annotation seems a natural progression.



SUPPLEMENTARY DATA 

Supplementary Data are available at NAR Online. 
Supplementary Labs 1-7.